Templeton's Features List
The features available in Templeton have been divided into categories
for ease in browsing:
- Mirroring
- Restrictions
- Log files
- Network
- Advanced features
Several options are available when mirroring with Templeton:
- Copying. Templeton retrieves HTML documents, inline images, and
linked files to the local computer system. All links traversed are retrieved,
regardless of file format. Templeton even retrieves some clickable image
maps.
- Link rewriting. HTML documents that are copied have their links
rewritten automatically so that they may be used by local browsers without
requiring internet access. Furthermore, the links are written using
relative file names. This allows for easy file relocation (just move the
entire subtree) and for use without a local WWW server.
- Saving. Templeton stores files using either long file names or the DOS FAT
8.3 file name format. On DOS-based computer systems (including Microsoft
Windows and OS/2 using a FAT file system), the retrieved files
are stored with truncated 8.3 names. Under operating systems that support
long file names, such as OS/2 using HPFS and Unix, Templeton stores files
with long, descriptive names. On these systems you may also
request 8.3 names when exporting to DOS-based machines.
- File Overwriting. Templeton may be configured to
overwrite existing files from a previous mirror, to retrieve only modified
files, or to skip files that already exist from a previous run.
- Simple HTML corrections. One of the most common types of errors in
HTML documents
is the (unintentional) omission of quotation marks. Most HTML browsers
forgive this typographical error; Templeton corrects it.
- Link removal. When a hyperlink is not traversed, Templeton can be
configured either to remove the link or to leave the untraversed link in place.
- Mapping only. Sometimes it is not desirable to create a mirror
image of a web site. Templeton can be configured to map remote sites, and to
not retrieve files.
- Server Identification. For some tasks, it is helpful if the type
of WWW server is known. Templeton generates a list of WWW server names
and types.
- E-mail lists. By popular request, Templeton can generate a
list of all e-mail addresses that it finds. This is useful for
automated mailing lists and contact information.
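The link rewriting described above can be illustrated with a short Python sketch. It is not Templeton's own code: the on-disk layout (one directory per host, `index.html` for directory URLs) and the helper names `local_path` and `relative_link` are assumptions made only for this example. The sketch shows why relative links survive moving the whole subtree.

```python
import posixpath
from urllib.parse import urlsplit


def local_path(url):
    """Map a URL to a hypothetical mirror path of the form host/path."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        # Assumed convention: directory URLs are saved as index.html.
        path += "index.html"
    return parts.netloc + path


def relative_link(from_url, to_url):
    """Return the link to to_url, relative to the saved copy of from_url."""
    source_dir = posixpath.dirname(local_path(from_url))
    return posixpath.relpath(local_path(to_url), start=source_dir)


# The image path is a made-up example.
print(relative_link("http://www.cs.tamu.edu/faculty/",
                    "http://www.cs.tamu.edu/images/logo.gif"))
# prints ../images/logo.gif
```

Because the rewritten link never mentions the mirror's absolute location, the copied tree can be opened directly from disk or relocated as a unit.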
To prevent unwanted wandering of Templeton across the entire World Wide Web,
the search may be restricted. Templeton supports the following types of
restrictions.
- Host restriction. Templeton may be explicitly told not to traverse
other WWW servers. The restricted server may be listed as any of the
following:
- Current host. Templeton will not traverse links that leave the
initial WWW server. This is the most common type of restriction.
- Subnet. Links within a subnet may be traversed, but WWW servers
outside of the subnet are not visited. This is especially useful when
your school or company maintains a number of WWW servers but you do not
wish to mirror the entire World Wide Web.
- Domain Name. In many cases, a company or school may exist on multiple
subnets, but maintain the same domain name. By restricting to a domain name,
these servers may be mirrored or mapped without traversing the entire World
Wide Web. An example of a restricted domain name is ".intel.com", which
allows only machine names that are in the Intel domain. This would allow
"www.intel.com" and "gopher.intel.com" but neither "www.intel.chips.com" nor
the machine "intel.com".
(These are just examples, not necessarily real machine names.)
- Path Restriction. When restricting to a single WWW server, you
may also wish to restrict to a specific subdirectory on that server. For
example, if you are interested only in the faculty at the Texas A&M
Computer Science Department, then you may wish to restrict to
http://www.cs.tamu.edu/faculty/. HTML documents not within the faculty
subdirectory would not be retrieved.
- Depth Restriction. Templeton processes links in a breadth-first
search pattern. In a breadth-first search, all links from a document are
traversed, then all links from the traversed documents are followed. By
restricting the depth of the search, you limit the number of links to be
followed.
Be cautious: in a breadth-first search, the number of links to follow can
grow exponentially with each additional level of depth.*
- Robot Exclusion. Applications that search the World Wide Web,
such as Templeton, are referred to as Web Robots. Many WWW servers do not
allow web robots to traverse the available information. Why not? Some robots
are not nice and generate so many requests in a short amount of time that the
WWW server slows to a crawl or breaks down. Other web robots try to index (or
mirror or map) proprietary, copyrighted, or temporary information. Finally
(and most commonly), some robots become stuck traversing infinite virtual
databases such as Yahoo.com, Tiger Census Maps, or Mud Games.
Templeton supports robot exclusion and can be configured to avoid
restricted paths on a server.
- Custom Restrictions. Templeton can be configured to traverse
(or not traverse) URLs based on user-specified criteria, such as
URLs that name particular directories or particular file types. Wildcard
characters, representing one or more characters, are permitted.
- Basic Authentication. Templeton supports basic WWW authentication.
Users are prompted for a name and password when accessing protected
documents. Templeton can also read the encoded password from a configuration
file.
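The domain, path, wildcard, and depth restrictions above can be combined in a small Python sketch. This is an illustration under stated assumptions, not Templeton's implementation: the function names, the keyword parameters, and the in-memory link table are all hypothetical, and the wildcard matching uses Python's `fnmatchcase`, where `*` matches zero or more characters rather than one or more.

```python
from collections import deque
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def allowed(url, domain_suffix=None, path_prefix=None, exclude=()):
    """Return True if url passes the (hypothetical) restriction settings."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if domain_suffix and not host.endswith(domain_suffix):
        return False
    if path_prefix and not parts.path.startswith(path_prefix):
        return False
    return not any(fnmatchcase(url, pattern) for pattern in exclude)


def crawl(start, links, max_depth, **restrictions):
    """Breadth-first walk of a link table, at most max_depth links from start."""
    seen = {start}
    visited = [(start, 0)]
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth restriction: do not follow links any further
        for target in links.get(url, []):
            if target in seen or not allowed(target, **restrictions):
                continue
            seen.add(target)
            visited.append((target, depth + 1))
            queue.append((target, depth + 1))
    return visited


# Toy link table standing in for real pages; the URLs are examples only.
LINKS = {
    "http://www.intel.com/": ["http://www.intel.com/a.html",
                              "http://gopher.intel.com/",
                              "http://www.intel.chips.com/"],
    "http://www.intel.com/a.html": ["http://www.intel.com/b.html",
                                    "http://www.intel.com/pic.gif"],
    "http://www.intel.com/b.html": ["http://www.intel.com/c.html"],
}

for url, depth in crawl("http://www.intel.com/", LINKS, max_depth=2,
                        domain_suffix=".intel.com", exclude=["*.gif"]):
    print(depth, url)
```

With a depth limit of 2, the page `c.html` is never reached, and both the GIF and the host outside ".intel.com" are skipped. If every page held b allowed links, level d of the search could hold up to b^d pages, which is why small depth limits matter.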
Templeton provides a number of log files while it operates:
- Remote Mapping. This log file contains a list of each web page
that was accessed, the links found on each page, and other useful information
such as robot exclusions and unreachable links/hosts. The entry for each web
page includes its reference point and the number of links you would
need to follow to reach the page.
- Local Mapping. Similar to remote mapping, the local map file
tells where each copied file was placed on the local file system.
- Server Identification. This optional log file maintains a list
of servers visited, including the DNS name and type of WWW server
that was found.
- Mailto Listing. This optional log file contains a list of e-mail
addresses that were found in the HTML documents and can be very useful for
generating mailing lists.
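As an illustration of how a mailto listing could be gathered, the following Python sketch collects addresses from `mailto:` links in an HTML document. It is only a sketch: the class and function names are hypothetical, it looks only at `href` attributes of anchor tags, and Templeton's actual method and log format are not described here.

```python
from html.parser import HTMLParser


class MailtoCollector(HTMLParser):
    """Collect unique addresses from <a href="mailto:..."> links."""

    def __init__(self):
        super().__init__()
        self.addresses = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().startswith("mailto:"):
                # Drop the scheme and any "?subject=..." suffix.
                address = value[len("mailto:"):].split("?", 1)[0]
                if address and address not in self.addresses:
                    self.addresses.append(address)


def find_mailto(html):
    """Return the mailto addresses found in html, in document order."""
    collector = MailtoCollector()
    collector.feed(html)
    collector.close()
    return collector.addresses


page = ('<a href="mailto:webmaster@example.com">Webmaster</a> '
        '<a href="MAILTO:info@example.com?subject=hi">Info</a> '
        '<a href="page.html">Next</a>')
print(find_mailto(page))
```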
These features incorporate network information.
- E-mail address. A good web browser/robot informs each server
"who" is running the software. This is normally your e-mail address. Since
the automatically determined address may not be the "correct" e-mail address,
Templeton allows you to modify this field.
- HTTP Proxy Support. For people who must use a proxy server to
reach servers beyond a firewall, Templeton allows the use of a proxy server.
- Spoof Support. Some web servers refuse to pass data to
"unsupported" browsers. This is usually seen with non-Netscape viewers.
Spoofing allows Templeton to camouflage its name and appear as a different
browser.
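The three network settings above correspond to ordinary HTTP mechanisms, which the following Python sketch illustrates with the standard `urllib.request` module. It assumes that the e-mail address is sent in the HTTP `From` header and the browser name in the `User-Agent` header; the function names, the proxy address, and the header values are hypothetical, and nothing is sent over the network.

```python
import urllib.request


def build_request(url, email, user_agent):
    """Prepare a request that identifies the operator and the client name."""
    request = urllib.request.Request(url)
    request.add_header("From", email)             # who is running the robot
    request.add_header("User-Agent", user_agent)  # name presented to the server
    return request


def make_opener(proxy_url=None):
    """Return an opener that routes HTTP requests through proxy_url, if given."""
    handlers = []
    if proxy_url:
        handlers.append(urllib.request.ProxyHandler({"http": proxy_url}))
    return urllib.request.build_opener(*handlers)


request = build_request("http://www.cs.tamu.edu/faculty/",
                        "user@example.com",
                        "Mozilla/3.0 (compatible)")
opener = make_opener("http://proxy.example.com:8080")
# opener.open(request) would perform the retrieval; it is not called here.
print(request.header_items())
```

Changing the `user_agent` argument is all that "spoofing" amounts to in this sketch: the server sees whatever name the client chooses to present.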
Templeton has many features that are considered "advanced."
- Non-interactive Setting. Templeton can operate without user
interaction. This is especially useful for automated retrieval or backups
of web documents.
- System commands. Templeton can execute other
applications on the retrieved documents.
* Neal's Web Conjecture: Yahoo is reachable within 8 links of any web page that has links to other machines.
Neal's Other Web Conjecture: You don't want to mirror or map Yahoo.
Document revision: 10 Mar. 1997 for Templeton 1.970
Copyright 1996,1997 N.A. Krawetz
Modification, republication, and redistribution of this
document is strictly prohibited. All rights reserved.